2020-08-05

Onchocerciasis

  • Disease caused by the filarial nematode Onchocerca volvulus.
  • It is common in tropical and subtropical areas of Africa and some parts of South America.
  • At least 17 million people are infected globally and 198 million are at risk of infection (NTD Modelling Consortium Onchocerciasis Group, 2019).
    Global burden and clinical manifestations

Life cycle of O. volvulus

  • It needs two different hosts to complete its life cycle: humans and black flies (Simulium sp.).
    Life cycle of O. volvulus (Basáñez et al., 2016)

  • Ivermectin is the mainstay of treatment, distributed through mass drug administration.

Modeling in onchocerciasis

  • Because of the complexity of the life cycle and the inability to grow O. volvulus in vitro, mathematical models are important for studying the parasite’s biology and the disease epidemiology (Basáñez et al., 2016).
    Modelling study showing the effect of frequency of ivermectin treatment on microfilarial prevalence (Hamley et al., 2019)

Rationale: Why geospatial model?

  • Existing models for onchocerciasis assume closed and homogeneous populations.
  • A recent epidemiological mapping study shows that onchocerciasis prevalence across Africa is heterogeneous and patchy.
    Onchocerciasis prevalence map (Zouré et al., 2014)

  • With onchocerciasis control progressing towards elimination, geospatially explicit models are becoming more important.

Project Aims

  • Aim 1: To develop a geospatial modeling framework for analysis of onchocerciasis prevalence
    • Identify different types of data needed for the analysis
    • Determine the ecological and socio-demographic factors driving onchocerciasis epidemiology
  • Aim 2: To investigate methods for incorporating vector and parasite genetic data in geospatial models
    • Determine ecological factors affecting vector and parasite population structure
    • Infer migration patterns and dispersal of vector populations using landscape genetics analysis
  • Aim 3: To model different scenarios, such as the effect of drug intervention and vector control, at different geospatial scales

Expectations

  • An updated spatio-temporal prevalence map for Ethiopia and other African regions depending on data availability.
  • Identification of ecological factors driving the vector and parasite population distribution, and thus, also onchocerciasis prevalence.
  • A method to incorporate genetic data into a geospatially explicit model for onchocerciasis.
  • A tool to monitor and formulate strategies for onchocerciasis elimination campaigns.

Project progress

Aim 1

Geospatial modelling framework for prevalence data

  • Identified sources of data needed
    • Prevalence data (systematic literature search, relevant public health institutes)
    • Climate and environmental data (Worldclim, SEDAC, NOAA, satellite data repository)
    • Genetic data (lab repository)
  • Explored two different geospatial modeling frameworks for prevalence data
    • Machine learning approach: Random forest algorithm
    • Bayesian approach: Integrated Nested Laplace Approximation (INLA)

Data sources for the prototype geospatial model

  • Ethiopian prevalence data from a publicly available database (Hill et al., 2019).

Onchocerciasis prevalence data from Ethiopia used for analysis

Environmental and climate data

Raster layer of some of the covariates masked to the border of Ethiopia

  • We can analyse which of these variables contribute to high prevalence and use them to predict prevalence in other locations.

1. Random Forest Model

  • Random forest is a machine learning technique which can easily handle multidimensional data.
  • Usually used for image and text classification problems, it has also been extended for spatial analysis.
  • Spatial dependency in the data was accounted for by including buffer distances to the sample locations as covariates.
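
The buffer-distance idea can be sketched as follows. This is a minimal illustration, assuming that each location receives its distance to every sample location as extra covariates; the slides do not give the implementation, and the function names and coordinates below are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def buffer_distance_features(targets, samples):
    """For each target location, return its distances to every sample location.

    Each row can be appended to the environmental covariates of that location,
    so the forest sees where a point lies relative to the survey sites.
    """
    return [[haversine_km(tlat, tlon, slat, slon) for slat, slon in samples]
            for tlat, tlon in targets]

# Hypothetical survey sites (lat, lon); not taken from the Ethiopian dataset.
survey_sites = [(9.0, 38.7), (7.7, 36.8), (11.6, 37.4)]
features = buffer_distance_features(survey_sites, survey_sites)
```

The same function can be applied to the prediction grid, so that training and prediction locations share one set of distance columns.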

Prevalence prediction with Random Forest Model

Predicted median prevalence with Random Forest Model

2. Bayesian Approach: INLA

  • Allows prior knowledge about the parameters to be incorporated in the form of probability distributions.
  • The number of cases (\(Y_i\)) observed out of the total number of people tested (\(N_i\)) was assumed to follow a binomial distribution \[ Y_i \mid P(\boldsymbol{x}_i) \sim \text{Binomial}(N_i, P(\boldsymbol{x}_i)) \]
  • The log odds of prevalence were modeled as \[ \text{logit}(P(\boldsymbol{x}_i)) = \beta_0 + \mathbf{X}_i^\intercal \boldsymbol{\beta} + S(\boldsymbol{x}_i). \]
  • \(S(\cdot)\) is a spatial random effect with a Matérn covariance function.
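
To make the model concrete, the sketch below evaluates the linear predictor, converts it to a prevalence with the inverse logit, and computes the binomial log-likelihood for one site. It is an illustration only: the coefficient, covariate and random-effect values are hypothetical, and the actual fitting is done by INLA, not by this code.

```python
import math

def inv_logit(eta):
    """Map a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def linear_predictor(beta0, beta, x, s):
    """beta_0 + X_i^T beta + S(x_i) for one location."""
    return beta0 + sum(b * xi for b, xi in zip(beta, x)) + s

def binomial_loglik(y, n, p):
    """Log-probability of y positives out of n people tested at prevalence p."""
    return math.log(math.comb(n, y)) + y * math.log(p) + (n - y) * math.log(1.0 - p)

# Hypothetical values for one survey site; none come from the fitted model.
eta = linear_predictor(beta0=-1.0, beta=[0.5, -0.2], x=[1.2, 0.4], s=0.3)
p = inv_logit(eta)                      # prevalence P(x_i)
ll = binomial_loglik(y=12, n=50, p=p)   # contribution of this site to the likelihood
```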

Prevalence prediction with INLA Model

Mean prevalence map generated from the INLA model

  • The Great Rift Valley appears to be the major geographical barrier influencing onchocerciasis epidemiology.

Summary

  • Prevalence can be predicted at locations where no data are reported.
  • We can find geographical barriers and ecological factors that can influence the prevalence.
  • Both random forest and INLA approaches are feasible, but random forest extrapolates poorly.
  • The INLA approach is computationally demanding.
  • I plan to go forward with both of these approaches and apply them depending on the nature of the data that become available.

Next steps

  • Collate prevalence data with greater spatial and temporal coverage.
  • Prepare additional covariates capturing river flow, temporal variation in climate, and socio-demographic data.

Example spatio-temporal map at different time slices

Next steps (contd.)

  • Estimating epidemiologically relevant parameters from genetic data.
    • Identify environmental factors affecting population structure of vectors and parasites.
    • Create a connectivity and resistance surface map which might provide insight into the migration patterns of vector populations.
  • Expanding the current empirical geospatial model to a dynamic model which will provide greater flexibility to model different intervention scenarios.

Gantt chart

Timeline for the project

Acknowledgement

  • Assoc. Prof. Warwick Grant
  • Dr. Shannon Hedtke
  • Dr. Karen McCulloch
  • Dr. Joel Miller
  • Dr. Rebecca Chisholm
  • The Grant Lab members

References

Basáñez, M. G., Walker, M., Turner, H. C., Coffeng, L. E., de Vlas, S. J., & Stolk, W. A. (2016). River Blindness: Mathematical Models for Control and Elimination. Advances in Parasitology, 94, 247–341. https://doi.org/10.1016/bs.apar.2016.08.003

Hamley, J. I. D., Milton, P., Walker, M., & Basáñez, M.-G. (2019). Modelling exposure heterogeneity and density dependence in onchocerciasis using a novel individual-based transmission model, EPIONCHO-IBM: Implications for elimination and data needs. PLOS Neglected Tropical Diseases, 13(12), e0007557. https://doi.org/10.1371/journal.pntd.0007557

Hill, E., Hall, J., Letourneau, I. D., Donkers, K., Shirude, S., Pigott, D. M., Hay, S. I., & Cromwell, E. A. (2019). A database of geopositioned onchocerciasis prevalence data. Scientific Data, 6(1), 67. https://doi.org/10.1038/s41597-019-0079-5

NTD Modelling Consortium Onchocerciasis Group. (2019). The World Health Organization 2030 goals for onchocerciasis: Insights and perspectives from mathematical modelling. Gates Open Research, 3, 1545. https://doi.org/10.12688/gatesopenres.13067.1

Zouré, H. G., Noma, M., Tekle, A. H., Amazigo, U. V., Diggle, P. J., Giorgi, E., & Remme, J. H. (2014). The geographic distribution of onchocerciasis in the 20 participating countries of the African Programme for Onchocerciasis Control: (2) pre-control endemicity levels and estimated number infected. Parasites & Vectors, 7(1), 326. https://doi.org/10.1186/1756-3305-7-326

Thank you

Omitted slides

Effect of covariates

  • Importance of covariates can be assessed with the variable importance plot
    Variable importance plot for covariates in the random forest model

Effect of covariates (contd)

  • Linear regression analysis was done to assess the relationship between covariates and the predicted prevalence
    Linear regression model for covariates and the predicted prevalence

Posterior probability distribution of effect parameter of covariates

Posterior probability distribution of effect parameter of covariates

Random Forests: Cross validation

  • Models were selected using k-fold cross-validation.
    Five-fold cross-validation for model validation and selection

  • Root mean square error (RMSE) and R-squared values were calculated for each model
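
A minimal sketch of the fold splitting and the two scores is given below. It assumes R-squared is computed as 1 - SS_res / SS_tot; some packages report the squared correlation instead, and the slides do not say which was used. The function names and the number of observations are hypothetical.

```python
import math
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (an assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Five folds over 23 hypothetical survey sites.
folds = k_fold_indices(n=23, k=5)
train_sizes = []
for held_out in folds:
    train = [i for f in folds if f is not held_out for i in f]
    # A model would be fitted on `train` and scored on `held_out` here.
    train_sizes.append(len(train))
```

Averaging RMSE and R-squared over the five held-out folds gives one pair of scores per candidate model, which is what the comparison between models needs.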

Random Forest Model selection

Random Forest Model: Prediction error

  • Prediction error was calculated from the upper and lower limit of predicted prevalence

    The prediction error is higher in the locations where predicted prevalence is higher

Comparison of predictions between Random forest model and INLA model

  • The correlation between predicted and observed prevalence was higher for the random forest model (97%) than for the INLA model (89%).

Scatter plot for the observed and predicted prevalence for the Random forest and the INLA model
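
For reference, a correlation of this kind can be computed as below. The sketch assumes the Pearson correlation coefficient, which the slides do not state explicitly, and the prevalence values are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical observed and predicted prevalence at five sites.
observed = [0.05, 0.20, 0.35, 0.50, 0.10]
predicted = [0.08, 0.18, 0.30, 0.45, 0.15]
r = pearson_r(observed, predicted)
```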

Selection of covariates

  • A hierarchical clustering algorithm was used to select the most representative covariates
    Dendrogram from the clustering analysis showing different clusters of covariates

  • Covariate lists based on 5, 10 and 15 clusters were generated
  • The potential influence of covariates (distance to river, rural-urban extent) on onchocerciasis prevalence was also considered
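
The clustering step can be illustrated with a small sketch. It assumes average-linkage agglomerative clustering with 1 - |r| as the distance between covariates; the linkage and distance actually used are not given in the slides, and the covariate values below are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cluster_covariates(columns, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    names = list(columns)
    # Distance is small when two covariates are strongly correlated (either sign).
    dist = {(a, b): 1.0 - abs(pearson_r(columns[a], columns[b]))
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean pairwise distance between the two clusters.
                d = sum(dist[a, b] for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters

# Hypothetical covariate values at five locations.
columns = {
    "temperature": [20, 22, 25, 27, 30],
    "elevation": [2400, 2100, 1700, 1300, 900],
    "rainfall": [900, 1500, 800, 1400, 1000],
}
clusters = cluster_covariates(columns, n_clusters=2)
```

One covariate per cluster can then be kept as its representative, which reduces redundancy among highly correlated layers.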

Prediction error: Random Forest and INLA approach

We can quantify prediction uncertainty in an error map and identify where more sampling is needed to increase predictive accuracy.